Normalization and microbial differential abundance strategies depend upon data characteristics
Authors
Abstract
BACKGROUND Data from 16S ribosomal RNA (rRNA) amplicon sequencing present challenges to ecological and statistical interpretation. In particular, library sizes often vary over several orders of magnitude, and the data contain many zeros. First, although we are typically interested in comparing the relative abundance of taxa in the ecosystems of two or more groups, we can only measure taxon relative abundance in specimens obtained from those ecosystems. Because comparing taxon relative abundance in the specimens is not equivalent to comparing it in the ecosystems, this presents a special challenge. Second, because the relative abundances of taxa in the specimen (as well as in the ecosystem) sum to 1, these are compositional data. Because compositional data are constrained to the simplex (they sum to 1) rather than free to vary in Euclidean space, many standard methods of analysis are not applicable. Here, we evaluate how these challenges impact the performance of existing normalization methods and differential abundance analyses.
RESULTS Effects on normalization: Most normalization methods enable successful clustering of samples according to biological origin when the groups differ substantially in their overall microbial composition. For ordination metrics based on presence or absence, rarefying clusters samples according to biological origin more clearly than other normalization techniques do. Alternative normalization measures are potentially vulnerable to artifacts due to library size. Effects on differential abundance testing: We build on previous work to evaluate seven proposed statistical methods using rarefied as well as raw data. Our simulation studies suggest that rarefying does not in itself increase the false discovery rates of many differential abundance-testing methods, although it does reduce sensitivity because a portion of the available data is discarded. For groups with large (~10×) differences in average library size, rarefying lowers the false discovery rate. DESeq2, without addition of a constant, increases sensitivity on smaller datasets (<20 samples per group) but tends towards a higher false discovery rate with more samples, very uneven (~10×) library sizes, and/or compositional effects. For drawing inferences regarding taxon abundance in the ecosystem, analysis of composition of microbiomes (ANCOM) is not only very sensitive (for >20 samples per group) but also, critically, the only method tested that controls the false discovery rate well.
CONCLUSIONS These findings guide which normalization and differential abundance techniques to use based on the data characteristics of a given study.
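As a rough illustration of what rarefying involves, the sketch below subsamples each sample's reads without replacement to a common depth and drops samples that fall below it. It is not the authors' pipeline: the function name `rarefy`, the toy count table, and the depth of 1000 are assumptions chosen only for the example.

```python
import numpy as np


def rarefy(counts, depth, seed=0):
    """Subsample each row (sample) of a taxon count table to `depth` reads.

    Reads are drawn without replacement, so no taxon can gain reads.
    Rows with fewer than `depth` reads are dropped; `keep` marks survivors.
    """
    counts = np.asarray(counts, dtype=np.int64)
    rng = np.random.default_rng(seed)
    keep = counts.sum(axis=1) >= depth
    rows = [rng.multivariate_hypergeometric(row, depth) for row in counts[keep]]
    rarefied = np.array(rows, dtype=np.int64).reshape(-1, counts.shape[1])
    return rarefied, keep


# Hypothetical table: 3 samples x 4 taxa, library sizes 1000, 10000 and 5.
table = np.array([
    [500, 300, 200, 0],
    [5000, 3000, 1990, 10],
    [3, 2, 0, 0],
])
rarefied, kept = rarefy(table, depth=1000)
print(rarefied.sum(axis=1))  # every retained sample now has 1000 reads
print(kept)                  # the 5-read sample is discarded
```

In this toy table the second sample loses 9000 of its 10000 reads and the third sample is removed entirely, which is the loss of data (and hence sensitivity) that the abstract attributes to rarefying.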
Related resources
Normalization of metatranscriptomic and metaproteomic data for differential gene expression analyses: The importance of accounting for organism abundance
Author: Manuel Kleiner. Affiliations: Energy Bioengineering and Geomicrobiology Group, Department of Geoscience, University of Calgary, Calgary, Canada; Department of Plant and Microbial Biology, North Carolina State Univers...
A robust approach for identifying differentially abundant features in metagenomic samples
MOTIVATION The analysis of differential abundance for features (e.g. species or genes) can provide us with a better understanding of microbial communities and their behaviors. However, it could also mislead us about the characteristics of microbial communities if the abundances or counts of features on different scales a...
Relation between students’ use of learning and study strategies and their academic and personal characteristics in Mashad University of Medical Sciences, 1999
Introduction. Given the importance of learning and study strategies in fostering academic achievement, which has generated a demand for assessing these behaviors, and given the lack of information about strategy use among college students, this study set out to explore the learning and study strategies of medical, dentistry and pharmaceutics college students of Mashad University o...
Effects of Thyme Essential Oil and Disodium Fumarate on Ruminal Fermentation Characteristics, Microbial Population and Nutrient Flow in a Dual Flow Continuous Culture System
The aim of the present study was to investigate the effects of di-sodium fumarate (DSF) and thyme essential oil (TEO), alone and in combination, on ruminal fermentation properties and microbial abundance. A dual-flow continuous culture system (DFCC) with eight 1400-mL fermenters was used over a 12 d period, divided into 9 d for adaptation and 3 d for sampling. Fermenters were fed 100 g d...
Robust statistical methods for differential abundance analysis of metagenomics data
This document outlines my 2011-2012 AMSC project for the 663/664 course series and in particular the mid-year progress. It is an ever-evolving document. The project is to develop Metastats 2.0, a software package for analyzing metagenomic data. We propose two major extensions and modifications to the Metastats software and the underlying statistical methods. The first extension of Metastats is a mi...
Journal title:
Volume 5, issue
Pages -
Publication date: 2017